Smart Paper Notes

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

Categories: Machine Learning | Natural Language Processing | Computer Vision

2022-12-09

Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large-scale regime. In this work, we propose sparse upcycling, a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.

translated by Google Translate
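As a rough illustration of the initialization idea in the abstract above, the following sketch builds a Mixture-of-Experts layer from one dense MLP block. The abstract does not spell out the procedure, so the details here are assumptions: each expert starts as a copy of the dense MLP weights, the router is freshly initialized at random, and the names `upcycle_mlp`, `w_in`, `w_out` and the 0.02 initialization scale are hypothetical. Plain Python lists stand in for tensors.

```python
import copy
import random


def upcycle_mlp(dense_mlp, num_experts, d_model, seed=0):
    """Build a hypothetical MoE layer from one dense MLP block.

    Assumption: every expert starts as an identical copy of the dense
    MLP weights. The router has no dense counterpart, so it is
    initialized randomly (the 0.02 scale is an arbitrary choice).
    """
    rng = random.Random(seed)
    # Independent copies, so experts can diverge during further training.
    experts = [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
    # Router maps a d_model-dimensional token to one logit per expert.
    router = [[rng.gauss(0.0, 0.02) for _ in range(num_experts)]
              for _ in range(d_model)]
    return {"experts": experts, "router": router}


# Toy "dense checkpoint": a 2x2 input projection and a 2x2 output projection.
dense_mlp = {
    "w_in": [[0.1, -0.2], [0.3, 0.4]],
    "w_out": [[0.5, 0.6], [-0.7, 0.8]],
}
moe = upcycle_mlp(dense_mlp, num_experts=4, d_model=2)
```

At initialization every expert computes the same function as the dense block, so any difference between experts would only emerge from continued training.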

A hybrid Bayesian network for medical device risk assessment and management

Joshua Hunte, Martin Neil, Norman Fenton

Categories: Machine Learning | Artificial Intelligence

2022-09-07

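ISO 14971 is the primary standard used for medical device risk management. While it specifies the requirements for medical device risk management, it does not specify a particular method for performing risk management. Hence, medical device manufacturers are free to develop or use any appropriate method for managing the risk of medical devices. The most commonly used methods, such as Fault Tree Analysis (FTA), cannot provide a reasonable basis for computing risk estimates when limited or no historical data are available, or when there is second-order uncertainty about the data. In this paper, we present a novel method for medical device risk management using hybrid Bayesian networks (BNs) that resolves the limitations of classical methods such as FTA and incorporates the relevant factors affecting the risk of medical devices. The proposed BN method is generic but can be instantiated on a system-by-system basis, and we apply it to a defibrillator device to demonstrate the process involved in medical device risk management during production and post-production. The example is validated against real-world data.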


Developing Optimal Causal Cyber-Defence Agents via Cyber Security Simulation

Alex Andrew, Sam Spillard, Joshua Collyer, Neil Dhir

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-25

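In this paper we explore cyber security defence by unifying a novel cyber security simulator with (causal) optimisation. Particular attention is paid to a recently published approach: dynamic causal Bayesian optimisation (DCBO). We propose that DCBO can act as a blue agent when provided with a view of a simulated network and a causal model of how a red agent spreads within that network. We investigate how DCBO can perform optimal interventions on host nodes in order to reduce the cost of intrusions caused by the red agent. Through this we demonstrate a complete cyber-simulation system, which we use to generate observational data for DCBO and to provide numerical quantitative results that lay the foundations for future work.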


Product safety idioms: a method for building causal Bayesian networks for product safety and risk assessment

Joshua Hunte, Martin Neil, Norman Fenton

Categories: Artificial Intelligence

2022-06-05

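Idioms are small, reusable Bayesian network (BN) fragments that represent generic types of uncertain reasoning. This paper shows how idioms can be used to build causal BNs for product safety and risk assessment that use a combination of data and knowledge. We show that the specific product safety idioms we introduce are sufficient to build full BN models for evaluating the safety and risk of a wide range of products. The resulting models can be used by safety regulators and product manufacturers even when there are limited (or no) product testing data.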


A Large-Scale Database for Graph Representation Learning

Scott Freitas, Yuxiao Dong, Joshua Neil, Duen Horng Chau

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2020-11-16

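With the rapid emergence of graph representation learning, new large-scale datasets are needed to distinguish model capabilities and to accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify 3 key components that are important for advancing the field of graph representation learning. To date, no single graph database offers all of these desired properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of malware function call graphs. MalNet contains over 1.2 million graphs, averaging over 15k nodes and 35k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular Reddit-12K database, MalNet offers 105x more graphs, graphs that are 39x larger on average, and 63x more classes. We provide a detailed analysis of MalNet, discussing its properties and provenance, along with an evaluation of state-of-the-art machine learning and graph neural network techniques. The unprecedented scale and diversity of MalNet offer exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.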


A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

Neil K. Chada, Quanjun Lang, Fei Lu, Xiong Wang

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-29

Kernels are efficient in representing nonlocal dependence and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptively to the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors: discretization error, model error, partial observation and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
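The divergence described in the abstract above can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a linear-Gaussian model whose likelihood leads to normal equations `A c = b`, works in the eigenbasis of `A` so everything is diagonal, uses the identity as the fixed prior covariance, and uses `A / lam` as the data-adaptive prior covariance with a hand-picked `lam` (the abstract selects this hyper-parameter by the L-curve method, which is not implemented here). All function and variable names are hypothetical.

```python
def posterior_mean_fixed_prior(eigs, b, sigma2):
    """Posterior mean with prior covariance I (assumed fixed prior).

    In the eigenbasis of A: mean_i = b_i / (a_i + sigma2).
    """
    return [bi / (ai + sigma2) for ai, bi in zip(eigs, b)]


def posterior_mean_adaptive_prior(eigs, b, sigma2, lam):
    """Posterior mean with prior covariance A / lam (assumed adaptive prior).

    The prior puts no mass where a_i == 0, so the mean is 0 there.
    Elsewhere: mean_i = a_i * b_i / (a_i**2 + lam * sigma2).
    """
    return [ai * bi / (ai * ai + lam * sigma2) if ai > 0 else 0.0
            for ai, bi in zip(eigs, b)]


# Eigenvalues of A; the last one is zero (singular inversion operator).
eigs = [2.0, 0.5, 0.0]
# Right-hand side with a small error (1e-3) in the zero-eigenvalue direction.
b = [2.0, 1.0, 1e-3]

sigma2 = 1e-8  # small observation-noise variance
fixed = posterior_mean_fixed_prior(eigs, b, sigma2)
adaptive = posterior_mean_adaptive_prior(eigs, b, sigma2, lam=1.0)
```

In this toy case the fixed-prior mean has a component `b_i / sigma2` in the zero-eigenvalue direction, which grows without bound as the noise shrinks, while the adaptive-prior mean stays at zero there and approaches `b_i / a_i` in the other directions.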


Search, Structure, and Sentiment: A Comparative Analysis of Network Opinion in Different Query Types on Twitter

Joshua Midha

Categories: Natural Language Processing

2022-12-25

Understanding the relationship between structure and sentiment is essential in highlighting future operations with online social networks, and more specifically within popular conversation on Twitter. This paper develops the relationship between the two variables: structure, defined as the composition of a directed network, and sentiment, a quantified value of the positive/negative connotations of a conversation. We highlight thread sentiment to be inversely proportional to the strength and connectivity of a network. The second portion of this paper highlights differences in query types, specifically how the aforementioned behavior differs within four key query types. This paper focuses on topical, event-based, geographic, and individual queries as orientations which have differing behavior. Using cross-query analysis, we see that the relationship between structure and sentiment, though still inversely proportional, differs greatly across query types. We find this relationship to be clearest within the individual queries and least prevalent within the event-based queries. This paper provides a sociological progression in our understanding of opinion and networks, while providing a methodological advancement for future studies on similar subjects.


Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Devdhar Patel, Joshua Russell, Francesca Walsh, Tauhidur Rahman, Terrance Sejnowski, Hava Siegelmann

Categories: Neural and Evolutionary Computing | Artificial Intelligence | Machine Learning

2022-12-25

We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control where the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
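One possible reading of the closed-loop "act-or-not" scheme in the abstract above can be sketched as a control loop. This is an illustration only: the abstract does not give the update rule, so the fixed `slow_period`, the convention that the fast controller returns `None` when it chooses not to act, and all function names are assumptions made for this example.

```python
def run_closed_loop(slow_policy, fast_policy, observations, slow_period):
    """Run a two-layer controller over a sequence of observations.

    Assumptions: the slow controller picks a new action every
    `slow_period` steps; at every step the fast controller either
    returns an overriding action or None ("do not act"), in which case
    the current slow action is applied.
    """
    actions = []
    slow_action = None
    for t, obs in enumerate(observations):
        if t % slow_period == 0:
            slow_action = slow_policy(obs)
        fast_action = fast_policy(obs, slow_action)
        actions.append(slow_action if fast_action is None else fast_action)
    return actions


def slow_policy(obs):
    # Hypothetical slow controller: push forward unless the reading is high.
    return 1.0 if obs < 0.5 else -1.0


def fast_policy(obs, slow_action):
    # Hypothetical fast controller: intervene only on a high reading.
    return -1.0 if obs > 0.5 else None


observations = [0.0, 0.1, 0.9, 0.2, 0.0, 0.1]
actions = run_closed_loop(slow_policy, fast_policy, observations, slow_period=3)
```

Here the fast controller overrides the slow action only at the third step, where the observation spikes between two slow decisions; at every other step the slow action is applied unchanged.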


Towards Scalable Physically Consistent Neural Networks: an Application to Data-driven Multi-zone Thermal Building Models

Loris Di Natale, Bratislav Svetozarevic, Philipp Heer, Colin Neil Jones

Categories: Machine Learning | Artificial Intelligence

2022-12-23

With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs on the other hand show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.


A Topic Modeling Approach to Classifying Open Street Map Health Clinics and Schools in Sub-Saharan Africa

Joshua W. Anderson, Luis Iñaki Alberro Encina, Tina George Karippacheril, Jonathan Hersh, Cadence Stringer

Categories: Machine Learning

2022-12-22

Data deprivation, or the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper we explore the suitability of data derived from OpenStreetMap to proxy for the location of two crucial public services: schools and health clinics. Thanks to the efforts of thousands of digital humanitarians, online mapping repositories such as OpenStreetMap contain millions of records on buildings and other structures, delineating both their location and often their use. Unfortunately much of this data is locked in complex, unstructured text rendering it seemingly unsuitable for classifying schools or clinics. We apply a scalable, unsupervised learning method to unlabeled OpenStreetMap building data to extract the location of schools and health clinics in ten countries in Africa. We find the topic modeling approach greatly improves performance versus reliance on structured keys alone. We validate our results by comparing schools and clinics identified by our OSM method versus those identified by the WHO, and describe OSM coverage gaps more broadly.
